ACHIEVING GENERALITY OVER CONDITIONS:

COMBINING  THE  MULTITRAIT  MULTIMETHOD  MATRIX  AND  THE  REPRESENTATIVE 

DESIGN  OF  EXPERIMENTS 


Doubts about the generality of results produced by psychological
research have been expressed with increasing frequency since Koch observed,
after a monumental review of scientific psychology in 1959, that there is
"a stubborn refusal of psychological findings to yield to empirical
generalization" (1959, pp. 729-788). Brunswik (1952, 1956), Campbell and
Stanley  (1966),  Cronbach  (1975),  Epstein  (1979,  1980),  Einhorn  and  Hogarth 
(1981),  Greenwald  (1975,  1976),  Hammond  (1966),  Meehl  (1978)  and  Simon 
(1979)  among  others,  have  also  called  attention  to  this  situation  and  some 
(Epstein,  1980;  Greenwald,  1976)  have  referred  to  it  as  a  "crisis."  All 
regard  it  as  a  fundamental,  persistent  problem  in  psychological  research. 

In  an  effort  to  develop  a  methodology  that  will  provide  generality 
without  the  loss  of  rigor,  we  build  upon  two  previous  methodological 
suggestions, (a) the multitrait multimethod matrix introduced by Campbell
and  Fiske  (1959)  and  (b)  the  representative  design  of  experiments 
introduced  by  Brunswik  (1956).  Data  from  a  study  of  experts  who  were 
required  to  employ  three  modes  of  cognition  in  each  of  three  judgment  tasks 
(see  Appendix  A)  provided  a  unique  opportunity  not  only  to  make  use  of  the 
multitrait multimethod matrix, but to extend it.

In their 1959 study of the field of individual differences, Campbell
and Fiske convincingly demonstrated the faults of the conventional
single-concept  single-operation  methodology.  The  overwhelming  majority  of 
studies  they  examined  showed  that  results  were  more  likely  to  be  determined 
by  the  methods  employed  by  the  experimenters  than  by  the  traits 
hypothesized  to  account  for  the  results.  Although  they  showed  that  this 
failure  to  separate  the  effects  of  operation  (method)  from  the  effects  of 
concept  (trait)  can  be  both  demonstrated  and  avoided  by  use  of  the 
multitrait  multimethod  matrix,  there  has  been  little  change  in  conventional 
research  methodology. 

The  problem  is  not  that  Campbell  and  Fiske's  work  went  unrecognized. 
It  became  a  milestone  in  the  methodological  literature  of  psychology,  and 
by  1983  had  been  cited  over  1000  times.  Yet  in  spite  of  the  potential  of 
the  multitrait  multimethod  matrix  for  breaking  the  grip  of  a  simpleminded 
operationism  on  psychological  research,  the  method  is  for  the  most  part 
simply  not  used.  Presumably  researchers  have  avoided  it  for  tactical 
reasons,  since  it  introduces  conceptual  complexity  (which  concepts  and 
which  methods  should  be  compared?)  and  requires  considerable  additional 
labor  and  apparatus  within  a  single  study.  Or  perhaps  there  is  general 
unawareness  of  the  ephemeral  character  of  results  produced  by 
single-concept  single-method  operationism.  Whatever  the  reason,  among  tens 
of  thousands  of  studies  of  individual  differences,  Turner  (cited  in  Fiske, 
1981)  found  only  70  published  matrices  between  1967  and  1980  (see  Fiske, 
1981, for a general review).

The multitrait multimethod matrix has probably never been used in
experimental  psychology,  although  its  logic  is  equally  applicable  to  that 
field  (cf.  Fiske,  1981).  We  examined  the  62  articles  in  Volume  9  (1983)  of 
the  Journal  of  Experimental  Psychology:  Human  Perception  and  Performance 
to  ascertain  whether  researchers  currently  make  a  systematic  effort  to 
separate  method  variance  from  concept  variance.  The  persistence  of 
one-concept one-method operationism was evident: only 18 articles were
found  to  employ  more  than  one  concept  or  more  than  one  method;  and  of 
these,  only  four  used  more  than  one  concept  and  more  than  one  method. 
None,  however,  systematically  separated  method  variance  from  concept 
variance;  only  one  of  the  authors  indicated  cognizance  of  this 
methodological  requirement.  The  multitrait  multimethod  approach  was  never 
mentioned. 

In parallel fashion, Brunswik's (1943, 1952, 1956) argument that
generalization over conditions requires the representation of ecological
conditions  in  the  design  of  experiments  must  be  considered  a  milestone  in 
the  methodological  literature  of  psychology;  his  work,  too,  has  been  cited 
over  1000  times,  yet  representative  designs  are  seldom  employed  (see 
Hammond  &  Wascoe,  1980,  for  some  examples).  Representative  design  was 
never  mentioned  in  the  62  articles  examined  in  the  volume  cited  above.  The 
same  reasons  that  led  students  of  individual  differences  to  forgo  the  use 
of the multitrait multimethod matrix also lead experimental psychologists
to  forgo  the  use  of  representative  design;  both  are  more  difficult  and 
time-consuming to execute than standard laboratory experiments.

Plan of the Article

In what follows we first present a description of the Campbell/Fiske
internal validity matrix; second, indicate our extension of it to an
external validity matrix that incorporates the theory of representative
design  of  experiments;  third,  show  the  complementarity  of  the  two 
matrices;  and  fourth,  illustrate  how  both  matrices  can  be  used  to  achieve 
generalization  over  conditions. 

The Campbell-Fiske Internal Validity Matrix

The internal validity multitrait multimethod matrix, presented in
Table  1,  is  developed  from  a  set  of  test  scores  taken  from  a  group  of 
subjects  (Campbell  &  Fiske,  1959).  The  scores  for  each  subject  are 
correlated  over  several  traits  and  methods.  The  authors  describe  the 
matrix  as  follows: 

This  illustration  involves  three  different  traits,  each  measured 
by  three  methods,  generating  nine  separate  variables.  It  will  be 
convenient  to  have  labels  for  various  regions  of  the  matrix,  and 
such  have  been  provided  in  Table  [1].  The  reliabilities  will  be 
spoken of in terms of three reliability diagonals, one for each
method. The reliabilities could also be designated as the
monotrait-monomethod values. Adjacent to each reliability
diagonal is the heterotrait-monomethod triangle. The reliability
diagonal and the adjacent heterotrait-monomethod triangle make up
a monomethod block. A heteromethod block is made up of a
validity diagonal (which could also be designated as
monotrait-heteromethod values) and the two
heterotrait-heteromethod triangles lying on each side of it.
Note that these two heterotrait-heteromethod triangles are not
identical.

In  terms  of  this  diagram,  four  aspects  bear  upon  the 
question  of  validity.  In  the  first  place,  the  entries  in  the 
validity  diagonal  should  be  significantly  different  from  zero  and 
sufficiently  large  to  encourage  further  examination  of  validity. 
This  requirement  is  evidence  of  convergent  validity.  Second,  a 
validity  diagonal  value  should  be  higher  than  the  values  lying  in 
its  column  and  row  in  the  heterotrait-heteromethod  triangles. 
That  is,  a  validity  value  for  a  variable  should  be  higher  than 
the  correlations  obtained  between  that  variable  and  any  other 
variables  having  neither  trait  nor  method  in  common.  This 
requirement  may  seem  so  minimal  and  so  obvious  as  to  not  need 
stating,  yet  an  inspection  of  the  literature  shows  that  it  is 
frequently  not  met,  and  may  not  be  met  even  when  the  validity 
coefficients  are  of  substantial  size.  In  Table  [1],  all  the 
validity  values  meet  this  requirement.  A  third  common-sense 
desideratum  is  that  a  variable  correlate  higher  with  an 
independent effort to measure the same trait than with measures
designed to get at different traits which happen to employ the
same method. For a given variable, this involves comparing its
values in the validity diagonals with its values in the
heterotrait-monomethod triangles. For variables A1, B1, and C1,
this requirement is met to some degree. A fourth desideratum is
that the same pattern of trait interrelationship be shown in all
of the heterotrait triangles of both the monomethod and
heteromethod blocks. The hypothetical data in Table [1] meet
this  requirement  to  a  very  marked  degree,  in  spite  of  the 
different  general  levels  of  correlation  involved  in  the  several 
heterotrait  triangles.  The  last  three  criteria  provide  evidence 
for  discriminant  validity.  (1959,  pp.  82-83). 
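
The second and third criteria in the passage above are mechanical comparisons, so they can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: the function name, the label lists, and the correlation values are all hypothetical (they are not the entries of Table 1), and the first criterion (significance of the validity values) and the fourth (similarity of pattern across triangles) are not implemented.

```python
import numpy as np

def campbell_fiske_checks(R, traits, methods):
    """Apply criteria 2 and 3 to every validity-diagonal entry of R.

    R       : square correlation matrix (one row/column per variable)
    traits  : trait label of each variable, e.g. ["A", "B", "C", "A", ...]
    methods : method label of each variable, e.g. [1, 1, 1, 2, ...]
    """
    n = len(traits)
    report = []
    for i in range(n):
        for j in range(i + 1, n):
            # Validity diagonal: same trait, different methods.
            if traits[i] != traits[j] or methods[i] == methods[j]:
                continue
            validity = R[i, j]
            # Heterotrait-heteromethod values in the same row and column
            # of this heteromethod block (neither trait nor method shared).
            het_het = [R[i, k] for k in range(n)
                       if traits[k] != traits[i] and methods[k] == methods[j]]
            het_het += [R[k, j] for k in range(n)
                        if traits[k] != traits[j] and methods[k] == methods[i]]
            # Heterotrait-monomethod values for either variable
            # (different trait, same method).
            het_mono = [R[i, k] for k in range(n)
                        if traits[k] != traits[i] and methods[k] == methods[i]]
            het_mono += [R[j, k] for k in range(n)
                         if traits[k] != traits[j] and methods[k] == methods[j]]
            report.append({
                "pair": (traits[i], methods[i], methods[j]),
                "validity": validity,
                "criterion_2": validity > max(het_het),
                "criterion_3": validity > max(het_mono),
            })
    return report

# Made-up correlations, for illustration only (not the values of Table 1).
traits = ["A", "B", "C"] * 3
methods = [1, 1, 1, 2, 2, 2, 3, 3, 3]
R = np.empty((9, 9))
for i in range(9):
    for j in range(9):
        same_trait = traits[i] == traits[j]
        same_method = methods[i] == methods[j]
        R[i, j] = (1.0 if i == j else 0.6 if same_trait
                   else 0.3 if same_method else 0.1)
report = campbell_fiske_checks(R, traits, methods)
```

With three traits and three methods the report has nine entries, one for each validity-diagonal value, and each entry records whether that value exceeds its heterotrait-heteromethod and heterotrait-monomethod comparisons.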

The  value  of  this  methodology  is  indisputable,  and  its  application 
will  yield  definite  and  useful  conclusions  regarding  the  validity  of 
psychological traits or theoretical concepts in general (see, e.g., Brewer
& Collins, 1981; Fiske, 1981). The results from such a matrix will have
populational and task generality insofar as the trait domain, the
apparatus/method domain and the subject domain have been adequately
sampled.  The  results,  therefore,  speak  to  the  question  of  the  construct 
validity  of  the  traits  investigated  separate  from  the  methods  used,  within 
the  restraints  chosen  by  the  investigator. 

Insert  Table  1  about  here 


Extension of the Campbell/Fiske Approach

Campbell and Fiske (1959) developed the multitrait multimethod matrix
in  order  to  evaluate  the  (a)  internal  validity  of  certain  (b)  traits  within 
the  study  of  (c)  individual  differences  based  on  (d)  group  data.  We  extend 
their  method  by  (a)  adding  an  external  validity  matrix;  (b)  using  both  the 
internal  and  the  external  validity  matrices  to  evaluate  concepts  in  general 
instead of traits; (c) using both matrices to test propositions, in the
tradition of experimental psychology; (d) making the behavior of the
individual rather than of the group the fundamental unit of analysis,
although group data can be analyzed as well. (See Hammond, McClelland, &
Mumpower, 1980, pp. 115-127 on the advantages of single-subject analysis;
also Meehl, 1978, on the deficiencies of conventional between-group and
within-group analyses.)

The  External  Validity  Matrix 

Table  2  presents  an  external  validity  matrix  that  is  based  upon 
correlations  between  nine  sets  of  engineers'  judgments,  made  under  three 
methods  (cognitive  modes)  for  each  of  three  concepts,  and  three  criteria. 
The  three  validity  diagonals  contain  monoconcept  correlations  between  each 
set  of  judgments  (one  for  each  method)  and  the  criterion  of  the  same 
concept  against  which  the  judgments  are  compared.  The  triangles  consist  of 
heteroconcept  correlations  between  the  judgments  made  in  each  condition 
(concept-method  unit)  and  the  criterion  for  a  different  concept.  A  method 
block  consists  of  a  validity  diagonal  and  the  heteroconcept  triangles  on 
either  side  of  it. 

The  coefficients  in  the  external  validity  matrix  in  Table  2  are 
different  from  those  in  the  internal  validity  matrix  in  that  each 
correlation  in  the  external  validity  matrix  is  between  judgments  and 
measures  of  a  criterion  rather  than  between  two  responses.  Aside  from  this 
very  important  difference,  the  interpretation  of  the  coefficients  with 
respect  to  the  questions  of  convergent  and  discriminant  validity  is  quite 
similar.  As  in  the  internal  validity  matrix,  correlations  in  the  external 
validity  diagonal  that  are  sufficiently  large  are  evidence  of  convergent 
validity.  In  Table  2  the  coefficients  in  the  diagonals  within  each  method 
block  would  show  the  external  convergent  validity  of  the  judgment  of  each 
concept  by  that  method.  Comparison  of  the  average  of  these  diagonal  values 
across  the  three  concepts  would  indicate  the  relative  external  convergent 


Achieving  Generality  over  Conditions 
Hammond,  Hamm,  and  Grassia 


Page  9 
02  Aug  84 


validity  of  each  method.  The  heteroconcept  triangles  consist  of  the 
correlations  of  the  expert's  judgments  of  one  concept  (by  a  particular 
method)  with  the  criterion  measure  of  a  different  concept.  Evidence  of 
discriminant  validity  exists  when  a  value  in  a  validity  diagonal  is  higher 
than  the  values  lying  in  its  column  and  row  in  the  heteroconcept  triangles. 
Further  tests  of  external  discriminant  validity  are  described  below. 
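
The construction of such a matrix can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the array layout (judgment columns ordered method by method), and the simulated data are all assumptions introduced here.

```python
import numpy as np

def external_validity_matrix(judgments, criteria):
    """Correlate each judgment set with each criterion across objects.

    judgments : (p, k*j) array, one column per method-concept condition,
                ordered method by method (assumed layout)
    criteria  : (p, j) array, one column per concept's criterion measure
    returns   : (k*j, j) array of Pearson correlations
    """
    jz = (judgments - judgments.mean(axis=0)) / judgments.std(axis=0)
    cz = (criteria - criteria.mean(axis=0)) / criteria.std(axis=0)
    return jz.T @ cz / judgments.shape[0]

def validity_diagonals(E, n_methods, n_concepts):
    """Return an (n_methods, n_concepts) array of monoconcept correlations."""
    blocks = E.reshape(n_methods, n_concepts, n_concepts)
    return np.array([np.diag(block) for block in blocks])

# Simulated data, for illustration only: 40 objects, 3 concepts, 3 methods.
rng = np.random.default_rng(0)
criteria = rng.normal(size=(40, 3))
judgments = np.tile(criteria, (1, 3)) + rng.normal(scale=0.5, size=(40, 9))
E = external_validity_matrix(judgments, criteria)
diagonals = validity_diagonals(E, n_methods=3, n_concepts=3)
```

Each row of `diagonals` holds the validity diagonal of one method block; the remaining cells of `E` are the heteroconcept correlations.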

Insert  Table  2  about  here 


The External Validity Matrix and the Representative Design of Experiments

The  argument  for  the  representative  design  of  experiments  is 
explicated  in  the  external  validity  matrix  because  the  naturally  occurring 
intercorrelations  among  criterion  variables  are  represented  in  the  matrix 
(see Table 2). For example, if the correlation between criteria for c1 and
c2 in Table 2 were .5, we would expect all correlations between judgments
of c1 and the criterion for c2 (and vice versa) to be as high as but no
higher than .5 if an engineer is performing appropriately. The
intercorrelations  among  the  criteria,  or  intraecological  correlations,  thus 
provide  a  standard  for  the  heteroconcept  correlations  in  the  external 
validity  matrix,  and  in  the  internal  validity  matrix  as  well.  Without 
ecological  representativeness  as  a  standard,  all  such  intercorrelations  are 
changed  by  the  experimenter  to  zero  in  the  conventional  systematic  design 
of  experiments.  Therefore,  generalization  cannot  be  achieved  on  logical 
grounds, and indeed is not achieved empirically, as the psychologists cited

Complementarity of the Internal and External Validity Matrices: Evaluating
Coherence, Performance, and Competence

The  usefulness  of  analyzing  the  external  validity  matrix  in 
conjunction  with  Campbell  and  Fiske's  internal  validity  matrix  is  that  the 
information  provided  by  these  matrices  is  complementary  and  makes  possible 
an  evaluation  of  cognitive  coherence,  performance,  and  competence.  The 
distinction  between  coherence  and  performance  is  intended  to  parallel  the 
traditional  distinction  between  the  coherence  and  correspondence  theories 
of truth (see, e.g., White, 1967 and Prior, 1967). The coherence theory
focuses  on  the  extent  to  which  statements  of  facts  or  judgments  put  forward 
cohere  (or  "hang  together")  with  one  another,  that  is,  are  related  by 
logical  implication.  The  internal  validity  matrix  parallels  the  coherence 
theory  of  truth  in  the  sense  that  it  demands  logical  rather  than  external, 
empirical  justification.  Although  the  internal  matrix  does  include 
empirical,  factual  material,  no  reference  to  empirical  criteria  outside  the 
matrix  itself  is  required  to  establish  the  internal  validity  of  a  set  of 
psychological  concepts.  All  that  is  required  is  that  a  logical  criterion 
be  met,  namely,  that  convergent  validities  should  be  high  and  discriminant 
validities  should  be  low. 

The  correspondence  theory  of  truth,  on  the  other  hand,  is  concerned 
with  the  extent  to  which  our  beliefs  about  the  world  perform,  or 
correspond,  with  respect  to  independently  determined  facts.  Therefore  an 
independent  measure  of  the  concepts  in  question  is  required  in  order  to 
test  the  correspondence  between  what  a  theory  predicts  and  what  exists. 
The  external  validity  matrix  thus  parallels  the  correspondence  theory  of 
truth  in  that  it  demands  the  evaluation  of  the  empirical  correspondence 
between psychological concepts and some independent measure of them.

Finally, because both matrices can be developed for a single subject
(as we demonstrate below), it is possible to combine the results from each
matrix into a single measure to provide a higher order indicator of each
expert's judgment that we shall call "competence" (see also McClelland,
1973).  Since  we  derive  the  measure  of  competence  from  measures  of 
coherence  and  performance  that  are  based  on  variations  in  both  method  and 
concept,  our  derivation  copes  directly  with  the  problem  of  generalization. 
In  the  present  case,  for  example,  the  conclusions  about  an  expert's 
coherence  and  performance,  and  thus  competence,  are  clearly  based  on,  and 
thus  limited  to,  his/her  behavior  over  the  three  methods  and  three  concepts 
employed  in  the  study. 

Summary of Similarities and Differences between Campbell and Fiske (1959)
and  the  Present  Approach 

The  two  efforts  are  similar  in  that  each  provides  comparisons  of 
convergent  validities  and  discriminant  validities  across  concepts  and 
methods  (see  Tables  1  and  2);  but  there  are  several  differences.  First, 
the  internal  validity  matrix  does  not  include  test-criterion  relations,  but 
the  external  validity  matrix  does.  Therefore  it  contains  correlation 
coefficients  that  indicate  the  relation  between  measures  of  each  subject's 
behavior  and  external,  empirical  criteria.  As  a  result,  the  meaning  of  the 
entries  in  the  cells  is  different  in  the  two  matrices.  The  correlation 
coefficient in each cell in the Campbell/Fiske internal validity matrix
indicates  the  correlation  between  pairs  of  test  measures,  whereas  the 
correlation  coefficients  in  the  external  validity  matrix  indicate  the 
correlation between a behavioral measure and an external criterion.

Second, the role of the individual subject in the two kinds of
analysis is very different. Each correlation coefficient in the Campbell
and Fiske (1959) multitrait multimethod matrix is across individuals, while
in  a  multiconcept  multimethod  analysis  each  is  across  the  objects  of 
judgment,  within  a  single  individual.  More  specifically,  in  a  multitrait 
multimethod  analysis,  each  of  n  individuals  is  measured  on  j  (traits)  times 
k  (methods)  occasions,  and  one  multitrait  multimethod  matrix  is  made  for 
the  whole  set  of  individuals.  In  a  multiconcept  multimethod  analysis,  each 
of  n  individuals  judges  each  of  p  objects  on  j  (concepts)  times  k  (methods) 
occasions,  and  a  separate  multiconcept  multimethod  matrix  is  constructed 
for  each  of  the  n  individuals. 
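
The difference between the two data layouts can be made concrete with a short sketch. The function names, array shapes, and simulated values below are assumptions made for illustration; they are not taken from the study.

```python
import numpy as np

def group_mtmm_matrix(scores):
    """Multitrait multimethod analysis: one matrix for the whole group.

    scores : (n, j*k) array, one row per individual, one column per
             trait-method measure; correlations run across individuals.
    """
    return np.corrcoef(scores, rowvar=False)

def individual_mcmm_matrices(data):
    """Multiconcept multimethod analysis: one matrix per individual.

    data : (n, k, j, p) array of judgments (individuals, methods,
           concepts, objects); correlations run across the p objects.
    returns : (n, k*j, k*j) array, one correlation matrix per individual
    """
    n, k, j, p = data.shape
    return np.stack([np.corrcoef(person.reshape(k * j, p)) for person in data])

# Simulated values, for illustration only.
rng = np.random.default_rng(1)
group = group_mtmm_matrix(rng.normal(size=(20, 9)))
individual = individual_mcmm_matrices(rng.normal(size=(20, 3, 3, 40)))
```

In the first case 20 individuals yield a single 9 x 9 matrix; in the second, the same 20 individuals yield 20 such matrices, each computed over the 40 objects of judgment.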

Third,  because  the  external  validity  matrix  must  contain  at  least  two 
criterion  variables  in  order  to  separate  concept  from  method,  the  relations 
between  criteria  in  circumstances  toward  which  the  generalization  is 
intended  must  be  measured  and  taken  into  consideration  when  the  subject's 
performance  is  evaluated.  Conventional  experimental  psychology  has  been 
able to sidestep this matter only because of its persistent, implicit
acceptance of single-concept single-method operationism. It is precisely
at this point, however, that the external validity matrix is directly
linked to Brunswik's (1956) representative design of experiments. In
representative designs, intra-ecological correlations between criteria
cannot be ignored and arbitrarily set to zero as is customary. This
convention introduces a design feature that must frustrate, and has
frustrated, generalization of results because the results are obtained
under conditions seldom if ever present in the conditions of application.
The use of representative design, however, means that correlations among
criterion variables in the experiment will represent those in the
circumstances to which the results of the experiment are intended to
generalize, or apply.
In  short,  the  same  logic  of  inductive  inference  that  we  apply  when 
generalizing  from  subject  sample  to  subject  population  will  apply  to 
generalizing  from  experimental  conditions  to  any  other  set  of  conditions 
(see,  for  example,  Brunswik,  1943,  1952,  1956;  Hammond,  1966;  Hammond  & 
Wascoe,  1980;  Einhorn  &  Hogarth,  1981;  Epstein,  1979,  1980). 

Fourth is a difference in aims. Campbell and Fiske's principal aim
was to enhance our methodological ability to evaluate the construct
validity of traits (Cronbach & Meehl, 1955). We take that aim to have been
achieved in principle (no one has challenged it), if not in practice. We
aim therefore to build upon that achievement by showing that both matrices
can be applied to experimental psychology as well as to the study of
individual differences (cf. Cronbach, 1975). In addition, we intend to
show that the internal validity matrix and the external validity matrix
provide complementary information: (a) the internal validity matrix method
can  be  used  to  evaluate  the  coherence  of  an  expert's  judgments,  (b)  the 
external  validity  matrix  can  be  used  to  evaluate  the  performance  of  an 
expert's  judgments,  and  (c)  measures  of  coherence  and  performance  can  be 
combined  to  provide  a  measure  of  competence. 

Illustrative  Application 

The Use of the Internal Validity Matrix in a Study of Expert Judgment

Data  for  an  internal  validity  matrix  based  on  a  study  of  20  highway 
engineers'  judgments  of  the  concepts  of  aesthetics,  safety  and  capacity 
using intuitive, quasi-rational, and analytical methods (see Appendix A)
are presented in Table 3. The data for the matrix were generated from the
mean of the 20 engineers' judgments for each of the 40 highways presented
to  them  for  each  concept-method  pair.  Thus,  the  matrix  illustrates  the 
particulars  of  the  behavior  of  an  artificial  engineer  constructed  from  the 
mean  judgments  of  this  group.  Data  from  the  artificial  engineer  are 
presented  mainly  to  illustrate  the  use  of  the  method;  no  inferences  can  be 
drawn  from  the  matrix  in  Table  3  to  a  matrix  generated  by  any  one  engineer. 
Illustrations  of  individual  matrices  are  provided  below. 

Insert  Table  3  about  here 


Each of the descriptions of the matrix presented by Campbell and Fiske
(1959) applies to the matrix in Table 3. The three validity diagonals
contain  values  that  are  high,  relative  to  the  heteroconcept  triangles 
adjacent  to  them,  thus  providing  evidence  for  internal  convergent  and 
discriminant  validity. 

Use of the External Validity Matrix in a Study of Expert Judgment

Table  4  presents  the  artificial  engineer's  external  validity  matrix, 
also  based  on  the  mean  of  20  engineers'  judgments. 

Insert  Table  4  about  here 


Convergent  validity  of  concepts.  The  external  validity  coefficient 
for  the  artificial  engineer's  aesthetics  judgments  made  by  the  film  strip 
method  is  .855,  by  the  bar  graph  method  is  .945,  and  by  the  formula  method 
is  .951,  thus  producing  a  mean  external  convergent  validity  value  across 
all  three  methods  of  .926  for  aesthetic  judgments.  (Note:  Fisher's 
z-transformation is used in the calculation of mean values.) Averaging
the validity correlations pertaining to safety from the three method blocks
yields a mean convergent validity of .568; similarly, averaging the
judgment-criterion correlations for capacity produces a mean convergent
validity value of .530. In short, the data suggest that, irrespective of
the method used, the artificial engineer judged highway aesthetics more
accurately than highway safety or capacity, and judged safety and capacity
with roughly equal accuracy.
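
The averaging of correlations through Fisher's z-transformation can be written out for the aesthetics values quoted above. The symbols r_i, z_i, and m are notation introduced here for illustration; the intermediate values are rounded to three decimals.

```latex
z_i = \operatorname{artanh}(r_i) = \tfrac{1}{2}\ln\frac{1 + r_i}{1 - r_i},
\qquad
\bar{z} = \frac{1}{m}\sum_{i=1}^{m} z_i,
\qquad
\bar{r} = \tanh(\bar{z})

% Aesthetics, m = 3 methods (film strip, bar graph, formula):
r = (.855,\ .945,\ .951)
\;\Rightarrow\;
z \approx (1.274,\ 1.783,\ 1.842),
\quad
\bar{z} \approx 1.633,
\quad
\bar{r} = \tanh(1.633) \approx .926
```

Because the transformation stretches values near 1, the z-based mean (.926) is somewhat higher than the arithmetic mean of the three correlations (.917).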

Convergent validity of methods. A measure of the external convergent
validity for each method may be calculated by averaging the
judgment-criterion correlations within each of the diagonals (.86, .70,
.29; .95, .68, .83; .95, .23, .27), thus obtaining external validities
for each method (.67, .85, .65). These results suggest that the artificial
engineer judged these three concepts most accurately in the quasi-rational
mode. Finally, the mean of the latter three coefficients is .74. This
measure is informative because it may be used to compare one group of
experts with another, to compare one individual with another (in the case
when a matrix is constructed for each individual), or to evaluate the
effect of a change in condition in either case. Moreover, the referential
domain  of  this  measure  is  clear;  it  is  general  over  the  three  methods  and 
three  concepts  employed  in  the  study,  as  well  as  the  group  of  engineers 
selected. 
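
The method-level averages can be recomputed from the diagonal values quoted above. The sketch below is an illustration only: the function and variable names are introduced here, and because the inputs are the two-decimal values printed in the text rather than the full-precision entries of Table 4, the results can differ from the reported means in the second decimal place.

```python
import numpy as np

def mean_correlation(rs):
    """Average correlations via Fisher's z: transform, average, back-transform."""
    return float(np.tanh(np.mean(np.arctanh(rs))))

# Validity-diagonal values as quoted in the text (two decimals).
# Rows: film strip, bar graph, formula.
# Columns: aesthetics, safety, capacity.
diagonals = np.array([[.86, .70, .29],
                      [.95, .68, .83],
                      [.95, .23, .27]])

method_means = [mean_correlation(row) for row in diagonals]
overall = mean_correlation(method_means)
```

Under these assumptions the three method means come out within about .01 of the reported .67, .85, and .65, and the overall mean within about .01 of the reported .74.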

Measuring discriminant validity with reference to intra-ecological
correlations.  The  intra-ecological  correlations  among  empirical  measures 
of  the  concepts  permit  an  additional  method  for  assessing  discriminant 
validity.  The  correlation  between  the  criterion  measures  of  the  concepts 
provides a standard against which to compare the heteroconcept correlations
between the expert's judgment of one concept and the criterion measure of a
different  concept.  For  example,  if  the  correlation  between  aesthetics  and 
safety  is  -.275,  then  it  is  appropriate  for  an  engineer's  judgments  of 
aesthetics  to  be  correlated  -.275  with  safety  (see  Appendix  B).  Similarly, 
if  the  correlation  between  two  criterion  measures  is  low  (as  for  safety  and 
capacity,  .180),  then  the  heteroconcept  correlations  should  also  be  low. 
In  short,  the  observed  correlations  between  judgments  of  aesthetics,  safety 
and  capacity  for  an  engineer  are  not  to  be  compared  to  a  standard  of  zero 
(an arbitrary demand for complete independence regardless of task
conditions)  but  to  a  standard  that  is  representative  of  task  conditions,  if 
we  are  properly  to  evaluate  the  discriminant  validity  of  the  judgments  of 
these  concepts  with  these  methods. 
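
One simple way to express this comparison is to subtract the intra-ecological correlation from each heteroconcept correlation, so that zero means the judgments match the task standard. The sketch below is an assumption-laden illustration and is not the procedure of Appendix B: the function name and matrix layout are introduced here, the aesthetics-capacity criterion correlation (.10) is a made-up placeholder, and the judge's values are hypothetical.

```python
import numpy as np

def ecological_deviations(E, C):
    """Heteroconcept correlations minus their intra-ecological standard.

    E : (k*j, j) external validity matrix, rows ordered method by method
    C : (j, j) correlations among the criterion measures
    Monoconcept (validity-diagonal) cells are returned as NaN.
    """
    j = C.shape[0]
    k = E.shape[0] // j
    standard = np.tile(C, (k, 1))
    on_diagonal = np.tile(np.eye(j, dtype=bool), (k, 1))
    return np.where(on_diagonal, np.nan, E - standard)

# Criterion intercorrelations (aesthetics, safety, capacity): -.275 and
# .180 are quoted in the text; .10 is a made-up placeholder.
C = np.array([[1.0, -.275, .10],
              [-.275, 1.0, .180],
              [.10, .180, 1.0]])

# A hypothetical judge whose heteroconcept correlations equal the standard.
E = np.tile(C, (3, 1))
E[np.tile(np.eye(3, dtype=bool), (3, 1))] = .80  # hypothetical validities
deviations = ecological_deviations(E, C)
```

For this hypothetical judge every heteroconcept deviation is zero, although several of the underlying correlations are not; a comparison against a standard of zero would have counted those correlations as failures of discriminant validity.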

To "untie" these variables, in other words to force zero
intercorrelations among them, is (a) to invite the engineer to judge an
unrepresentative set of conditions and thus (b) to extrapolate his results
illegitimately  from  irrelevant  conditions  to  the  relevant  ones.  These  two 
tactics  have  an  embarrassingly  long  history  in  psychology;  they  are 
customarily  explained  away  by  arguments  that  "this  is  the  best  we  can  do" 
and/or  "it  doesn't  matter,  anyway."  Neither  argument  is  correct,  but 
neither  is  necessary;  the  external  validity  form  of  the  multiconcept 
multimethod  matrix  makes  it  possible  to  evaluate  the  competence  of  experts 
(or  other  subjects)  in  relation  to  the  task  conditions  to  which  their 
judgments  are  to  be  applied. 

The  examples  presented  below  illustrate  the  detailed  application  of 
both  the  internal  and  external  validity  matrices  to  the  study  of  expert 
judgment.  The  first  section  describes  the  use  of  both  matrices  for  testing 
propositions in the context of experimental psychology, and the second
describes the use of the matrices in connection with the study of
individual  differences. 

Application to Experimental Psychology
Internal Validity Matrix

The analyses to be reported in this section require that a matrix,
similar  to  that  for  the  artificial  engineer  of  Table  3,  above,  be  produced 
for  each  engineer,  and  that  convergent  or  discriminant  validities  be 
determined  for  each. 

It  is  possible  to  derive  one  criterion  of  convergent  validity  and  four 
criteria  of  discriminant  validity  from  the  internal  validity  matrix.  The 
criterion  for  convergent  validity  and  one  for  discriminant  validity  are 
described  below.  The  remaining  criteria  for  internal  discriminant  validity 
are  described  in  Appendix  C. 

Convergent validity. The convergent validity measure (monoconcept
heteromethod  correlations  between  judgments  of  the  same  concept  using 
different  methods)  can  be  used  to  test  hypotheses  concerning  the  empirical 
status  of  each  concept.  For  example, 

H1: Each theoretical concept has empirical meaning, i.e., there
is convergent validity for each concept across methods and
within an appropriate sample of subjects.

Hypothesis 1 can be tested by asking whether, for each subject,
judgments of the quantity of a concept covary, independently of the methods
used to make the judgments. For example, for the artificial engineer
(Table 3) the correlation between the film strip and bar graph methods for
the aesthetics concept is .890; for the film strip and formula methods,
.864; and for the bar graph and formula methods, .985. The overall
convergent validity for aesthetics is the mean of these correlations
(z-transformed), .938, which is significant at p < .001. A matrix was
developed for each of the 20 engineers individually, and this procedure was
carried out for each of the three concepts. All 20 engineers had
significant positive convergent validities for aesthetics, 16 for safety,
and 17 for capacity. Hence we conclude that each of the three concepts is
capable of being measured by appropriate subjects independently of the
method  used;  generality  has  been  achieved  over  three  methods. 
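
The averaging step described above can be sketched as follows. This is a minimal illustration, assuming that "the mean of these correlations (z-transformed)" means averaging the Fisher z-transforms (atanh) and converting the mean back to a correlation (tanh). The function name `mean_correlation` is ours and does not come from the study.

```python
import math

def mean_correlation(correlations):
    """Average correlations via Fisher's z-transform (hypothetical helper)."""
    z_values = [math.atanh(r) for r in correlations]  # r -> z
    mean_z = sum(z_values) / len(z_values)            # average in z space
    return math.tanh(mean_z)                          # z -> r

# Aesthetics convergent validities for the artificial engineer (Table 3):
# film strip/bar graph, film strip/formula, bar graph/formula.
aesthetics = [0.890, 0.864, 0.985]
print(round(mean_correlation(aesthetics), 3))  # 0.938
```

Under this assumption the sketch reproduces the reported overall convergent validity of .938 for aesthetics.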

More  specific  hypotheses  may  also  be  addressed,  for  example, 

H2:  No  concept  has  higher  or  lower  convergent  validity  than  any 
other. 

To  test  Hypothesis  2,  the  computed  mean  of  the  z-transforms  of  the 
three  aesthetics  convergent  validities  (indicated  in  the  previous 
paragraph)  is  compared  to  the  means  of  the  safety  and  capacity  convergent 
validities,  for  each  engineer.  The  results  indicate  that  17  of  the  20 
engineers  had  greatest  convergent  validity  when  judging  the  aesthetics 
concept (Chi-squared = 21.75, p < .001). A t-test analysis shows that,
over  the  20  engineers,  the  convergent  validity  score  for  aesthetics  was 
significantly higher than the score for safety (t = 5.08, p < .001) and for
capacity (t = 5.66, p < .001). Again, the generality of the results is not
contingent  upon  a  single  method;  the  domain  of  generality  over  concepts, 
methods  and  subjects  is  made  evident. 
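
The text does not state how the chi-squared statistic was computed. The reported value is, however, numerically consistent with a continuity-corrected (Yates) chi-squared with one degree of freedom, in which the number of engineers for whom a given concept ranks highest is compared with the count expected if each of the three concepts were equally likely to rank highest (p = 1/3). The sketch below rests on that assumption; the function name and the null probability are ours.

```python
def yates_chi_squared(successes, n, p_null):
    """Continuity-corrected chi-squared (df = 1) for a two-category count.

    Assumption: this is our reconstruction, not a procedure stated in the text.
    """
    observed = [successes, n - successes]
    expected = [n * p_null, n * (1 - p_null)]
    # Yates correction: shrink each absolute deviation by 0.5 before squaring.
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))

# 17 of 20 engineers had their highest convergent validity for aesthetics.
statistic = yates_chi_squared(17, 20, 1 / 3)
print(round(statistic, 2))  # about 21.76, against the reported 21.75
```

The small discrepancy in the last digit would arise if the expected frequencies were rounded during hand calculation. With p_null = 0.5 the same function matches the chi-squared values reported later for Hypothesis 8.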

Questions  regarding  the  relative  efficacy  of  methods  over  concepts  may 
also  be  addressed.  For  example, 

H3:  No  method  pair  has  higher  or  lower  convergent  validity  than 
any  other  method  pair. 

To  test  Hypothesis  3,  we  must  consider  the  convergent  validities 
related  to  each  pair  of  methods.  When  the  film  strip  and  bar  graph  methods 
are  applied  to  aesthetics,  the  convergent  validity  is  .890,  to  safety, 
.713,  and  to  capacity,  .591  for  the  artificial  engineer  of  Table  3.  The 
mean  (via  z-transforms)  of  these  correlations  is  .761.  The  mean  for  each 
of  the  possible  method  pairs  is  calculated  through  the  development  of  a 
matrix  for  each  engineer,  and  the  order  among  pairs  is  determined,  similar 
to  the  analysis  used  for  testing  Hypothesis  2.  For  17  of  the  20  engineers 
the  bar  graph  and  formula  were  the  method  pair  that  produced  the  highest 
convergent validity across the three concepts (Chi-squared = 21.754,
p < .001). This result identifies the pair of methods that yields the
highest convergent validity across the three concepts.

Discriminant  validity.  Convergent  validity  informs  us  about  the 
covariance  of  judgments  across  methods,  and  thus  about  the  status  of  a 
concept  independent  of  the  method  used  to  measure  it.  In  addition, 
however, we need to know whether the concept is discriminable from other
proposed theoretical entities. The first internal discriminant validity
analysis  employed  in  the  examples  below  compares  monoconcept  heteromethod 
correlations  to  heteroconcept  heteromethod  correlations.  Campbell  and 
Fiske  (1959)  gave  first  priority  to  this  test;  for  although  many  people 
would think it "so minimal and obvious as not to need stating" (p. 82),
they  observed  that  it  often  fails  to  be  true.  We  therefore  illustrate  the 
test  for  the  following  hypothesis: 


H4:  All  pairs  of  concepts  are  equally  discriminable. 


This  hypothesis  will  be  tested  by  calculating  an  index  for  each 
concept  pair  for  each  engineer,  and  looking  for  evidence  of  any  concept 
being  more,  or  less,  discriminable  than  the  others,  for  a  statistically 
significant  number  of  engineers.  To  illustrate  the  calculation  of  the 
index  for  the  aesthetic  and  safety  concepts,  for  the  artificial  engineer  of 
Table  3,  we  compare  the  correlations  from  the  validity  (monoconcept 
heteromethod) diagonals that involve either aesthetics (.890, .864, .985)
or safety (.713, .393, .422) with the correlations from the heteroconcept
heteromethod triangles that involve both concepts (.283, .244, .360, .093,
.548, and .209). (The sign on all heteroconcept correlations involving
aesthetics  was  reversed  because  the  intra-ecological  correlations  between 
the  criterion  measures  of  aesthetics  and  safety,  and  of  aesthetics  and 
capacity,  were  negative.)  In  order  to  aggregate  these  comparisons  into  an 
index,  we  subtract  the  mean  of  the  z-transformations  of  the  second  set  of 
correlations  (.306)  from  the  mean  of  the  z-transformations  of  the  first  set 
(1.156),  which  produces  an  index  (.850)  of  the  discriminability  of  the 
aesthetics and safety concepts. The corresponding index for aesthetics and
capacity is .913; for safety and capacity, -.047. Thus, for the
artificial  engineer  aesthetics  and  capacity  are  the  easiest  concepts  to 
discriminate,  and  safety  and  capacity  are  most  difficult  to  discriminate  (a 
result  which  carries  some  practical  implications). 
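
The index calculation for the aesthetics and safety concepts can be sketched as follows. This is an illustration only: the function names are ours, and the heteroconcept values are entered as printed in the text, that is, after the sign reversal the authors describe.

```python
import math

def mean_z(correlations):
    """Mean of the Fisher z-transforms of a list of correlations."""
    return sum(math.atanh(r) for r in correlations) / len(correlations)

def discriminability_index(validity, heteroconcept):
    """Mean z of the validity diagonals minus mean z of the heteroconcept cells."""
    return mean_z(validity) - mean_z(heteroconcept)

# Artificial engineer, Table 3: monoconcept heteromethod correlations for
# aesthetics and safety, then the six heteroconcept heteromethod correlations.
validity = [0.890, 0.864, 0.985, 0.713, 0.393, 0.422]
heteroconcept = [0.283, 0.244, 0.360, 0.093, 0.548, 0.209]

index = discriminability_index(validity, heteroconcept)
print(round(index, 2))  # 0.85
```

Because the inputs are the three-decimal correlations printed in the text, the result agrees with the reported index of .850 only to within rounding.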

This  index  of  discriminant  validity  is  calculated  for  each  concept 
pair  from  each  subject's  matrix,  and  the  order  among  concept  pairs  is 
determined.  For  all  20  engineers,  the  safety  and  capacity  concepts  were 
least  discriminable  (Chi-squared  =  37.053,  p  <  .001).  Therefore  null 
hypothesis  4  is  rejected,  for  the  engineers'  judgments  of  safety  and 
capacity  are  more  similar  to  each  other  than  either  is  to  their  judgment  of 
aesthetics. The remaining three indices of internal discriminant validity
are  described  in  Appendix  C. 

External  Validity  Analysis 

One  measure  of  convergent  validity  and  three  measures  of  discriminant 
validity  can  be  derived  from  the  external  validity  matrix.  In  addition, 
two  measures  of  external  discriminant  validity  can  be  produced  using  data 
from the internal validity matrix.

Convergent  validity.  The  external  convergent  validity  measure  is 
based  on  the  correlation  between  the  engineer's  judgments  of  a  concept  and 
the  criterion  measure  of  that  concept.  We  examine  first  the  relative 
convergent  validity  of  each  concept,  thus: 

H5:  No  concept  has  higher  or  lower  external  convergent  validity 
than any other.

Hypothesis 5 is tested by averaging the z-transforms of the correlations
for  each  concept  across  methods,  and  then  comparing  the  averages  for  each 
concept.  Thus  the  aesthetics  concept  had  higher  convergent  validity  than 
safety or capacity for all 20 engineers (Chi-squared = 37.053, p < .001).
Despite the counterintuitive nature of this result, it has a claim to our
attention; it is general across three methods and stands against two other
concepts.

Similar  questions  of  external  convergent  validity  can  be  addressed  to 
methods.  For  example, 

H6:  No  method  has  higher  or  lower  external  convergent  validity 
than  any  other. 

Hypothesis  6  is  tested  by  averaging  the  z-transforms  of  the 
correlations  for  each  method  across  concepts,  and  comparing  methods.  The 
film  strip  method  was  found  to  have  the  lowest  convergent  validity  for  18 
of the 20 engineers (Chi-squared = 26.404, p < .001). It is least
dependable  in  the  context  of  this  study.  Methods  and  results  for 
Hypotheses  5  and  6  are  given  in  more  detail  in  Hammond,  Hamm,  Grassia,  and 
Pearson  (1984). 

Discriminant  validity.  The  external  validity  matrix  provides  three 
ways  of  measuring  external  discriminant  validity,  and  two  additional 
measures  can  be  produced  from  the  internal  validity  matrix  in  combination 
with  the  criterion  intercorrelations.  These  measures  can  be  used  to  ask 
whether concepts can be discriminated accurately. For example,

H7: All pairs of concepts are equally discriminable.

The  first  external  discriminant  validity  measure  is  analogous  to  the 
first  internal  discriminant  validity  measure,  and  is  used  to  test 
Hypothesis  7  just  as  the  latter  was  used  to  test  Hypothesis  4:  by 
calculating  an  index  of  discriminability  for  each  concept  pair  for  each 
engineer,  comparing  the  concept  pairs,  and  determining  whether  any 
particular  order  among  the  concept  pairs  occurred  in  a  significant  number 
of  engineers.  Thus,  for  the  artificial  engineer  in  Table  4,  the 
discriminability  of  the  aesthetics  and  safety  concepts  is  measured  by 
subtracting  the  mean  of  the  z-transforms  of  the  heteroconcept  correlations 
involving  aesthetics  and  safety  (-.016,  .362,  .233,  .497,  .313,  and  .226) 
from  the  mean  of  the  z-transformations  of  the  achievement  correlations 
involving  aesthetics  or  safety  (aesthetics:  .855,  .945,  .951;  safety: 
.702,  .683,  .226),  a  difference  of  .855.  This  figure  is  calculated  for 
each  concept  pair  for  each  engineer;  the  safety  and  capacity  concepts  were 
least  discriminable  for  each  of  the  20  engineers  (Chi-squared  =  37.05, 
p  <  .001),  a  result  that  is  consistent  with  that  obtained  in  the  internal 
validity  matrices. 

The  availability  of  information  about  the  intercorrelation  among  the 
measured  criteria  makes  possible  four  additional  procedures  besides  the 
first  measure  of  external  discriminant  validity  described  above.  The 
second and third procedures involve direct comparison of heteroconcept
correlations  with  the  correlations  between  the  criterion  measures  of  the 
two  concepts,  for  the  external  and  internal  validity  matrices  respectively. 
The  fourth  and  fifth  procedures  involve  testing,  for  both  matrices,  whether 
the  pattern  of  correlations  in  each  heteroconcept  triangle  is  identical  to 
the pattern of correlations among the three criterion measures.

The second and third external discriminant validity measures allow us
to  ask  whether  there  is  systematic  over-  or  underdiscrimination  between 
concepts  by  testing  the  following  hypothesis: 

H8:  Concepts  are  discriminated  accurately. 

The  second  external  discriminant  validity  measure,  which  compares 
heteroconcept  correlations  from  the  external  validity  matrix  with  the 
corresponding  correlations  between  the  criterion  measures,  was  used  to  test 
Hypothesis  8.  A  parallel  test  could  be  carried  out  with  the  third  external 
discriminant  validity  index,  which  uses  heteroconcept  correlations  from  the 
internal  validity  matrix. 

From  each  heteroconcept  correlation  in  the  external  validity  matrix, 
the  corresponding  criterion  intercorrelation  (intra-ecological  correlation) 
is  subtracted  (after  z-transformation).  The  mean  of  these  differences  for 
the  set  of  heteroconcept  correlations  corresponding  to  a  pair  of  concepts 
indicates  the  extent  of  the  engineer's  under-  or  overdiscrimination  of  the 
concepts.  This  procedure  can  be  carried  out  for  the  safety  and  capacity 
concepts  for  the  artificial  engineer  (Table  4).  We  subtract  the 
z-transform  of  .180,  the  correlation  between  their  criterion  measures,  from 
the  mean  of  the  z-transforms  of  the  heteroconcept  correlations  involving 
safety and capacity (.683, .399, .516, .437, .199, .383), producing a
difference of .302. The positive sign of this number indicates that,
overall, the artificial engineer underdiscriminates between safety and
capacity (confirming two prior results). This procedure was carried out
for each concept pair, for each engineer. Fourteen of the 20
overdiscriminated between aesthetics and safety (Chi-squared = 2.45,
df = 1, NS), 15 overdiscriminated between aesthetics and capacity
(Chi-squared  =  4.05,  p  <  .05),  and  19  underdiscriminated  between  safety  and 
capacity  (Chi-squared  =  14.45,  p  <  .001). 
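
The under- and overdiscrimination measure for one concept pair can be sketched as follows. Subtracting the z-transformed criterion correlation from each z-transformed heteroconcept correlation and then averaging is arithmetically the same as subtracting it once from the mean, which is what this sketch does. The function name is ours.

```python
import math

def discrimination_bias(heteroconcept, criterion_r):
    """Mean z of heteroconcept correlations minus z of the criterion correlation.

    Positive values indicate underdiscrimination (judgments more alike than
    the criteria); negative values indicate overdiscrimination.
    """
    mean_z = sum(math.atanh(r) for r in heteroconcept) / len(heteroconcept)
    return mean_z - math.atanh(criterion_r)

# Artificial engineer, Table 4: safety and capacity.
safety_capacity = [0.683, 0.399, 0.516, 0.437, 0.199, 0.383]
bias = discrimination_bias(safety_capacity, 0.180)
print(round(bias, 3))  # 0.302
```

The positive result corresponds to the underdiscrimination of safety and capacity reported for the artificial engineer.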

The testing of a hypothesis using the fifth operation measuring
external discriminant validity is described in Appendix C.

Summary 

In  this  section  we  have  illustrated  the  application  of  the 
multiconcept  multimethod  validity  analysis  to  topics  typically  of  concern 
to  experimental  psychologists:  testing  theoretical  propositions  regarding 
the comparison of concepts and methods. This was done by using the
internal validity matrix, which is concerned solely with the relations
among different judgments of the concepts, obtained under different
methods; and the external validity matrix, which is concerned with the
relation between the judgments and the criterion measures of the concepts.

Our  illustration  highlights  the  complementarity  of  these  two  analyses. 
We found in both the internal and external validity analyses that the
aesthetics concept has the highest convergent validity; that the best pair
of methods to use to obtain discriminant validity (in these conditions) is
the quasi-rational, bar graph method and the analytical, formula-producing
method; and that safety and capacity are least discriminable from each
other.  The  external  validity  analysis  was  able  to  put  this  last  finding  in 
sharper  perspective  than  could  the  internal  validity  analysis.  It  showed 
that  the  engineers  underdiscriminate  safety  and  capacity  in  comparison  with 
the  intercorrelation  between  the  criterion  measures  of  these  concepts.  The 
engineers' judgments of these two concepts are most highly correlated,
while in fact the criterion measures of these concepts have the lowest
intercorrelation. This could have been otherwise; that is, if safety and
capacity had actually been very highly correlated, the engineers might have
overdiscriminated them even if the internal validity analysis had indicated
that  these  two  concepts  are  discriminated  less  than  any  other  pairs  of 
concepts.  The  external  validity  analysis  provides  the  only  way  to 
determine  which  of  these  possibilities  is  true. 

Application  to  Individual  Differences 

The  multiconcept  multimethod  approach  can  be  used  to  study  individual 
differences  in  the  competence  of  expert  judgment.  The  need  for  individual 
comparisons  is  apparent  from  Tables  5,  6,  7,  and  8,  which  show  the  internal 
and  external  validity  matrices  for  two  engineers.  Engineer  A's  validity 
correlations  are  high  (mean  of  9  monoconcept  heteromethod  correlations  from 
internal  validity  matrix  =  .705;  mean  of  9  monoconcept  correlations  from 
external  validity  matrix  =  .741),  while  Engineer  B's  are  low  (mean  from 
internal  matrix  =  .358;  mean  from  external  matrix  =  .609).  Similar 
differences  can  be  seen  in  their  discriminant  validities. 

Insert  Tables  5,  6,  7,  8  about  here 


Indices  of  Coherence,  Performance  and  Competence 

Individual  differences  among  engineers  can  be  studied  using  numerical 
measures  of  convergent  and  discriminant  validity  derived  from  both  the 
internal  and  external  validity  matrices.  A  procedure  for  evaluating 
validity  can  be  converted  into  a  numerical  measure  by  adding  (or 
subtracting, as appropriate) the means of the z-transforms of the
correlations in all the relevant cells in the matrix. The formulas for
producing  these  indices  are  given  in  Table  9  and  explained  in  Appendix  D. 

These measures can be combined into indices that measure overall
internal validity (see Figure 1), which indicates the coherence of the
engineer's judgments; the corresponding index of external validity
indicates the engineer's performance, i.e., the correspondence between his
judgments and reality. And the mean of these two indices provides a
measure of the engineer's overall competence. Each index can be produced
at  different  levels  of  aggregation  (e.g.,  for  each  concept  or  for  each  pair 
of  methods;  see  columns  of  Table  9),  thus  allowing  numerical  comparisons 
among  these  indices  at  each  level. 

Insert  Figure  1  and  Table  9  about  here 

Measurements  of  coherence  and  performance  are  of  special  theoretical 
importance.  The  coherence  of  a  person's  judgments  is  the  central 
characteristic  of  one  of  the  traditional  theories  of  knowledge,  the 
coherence  theory  of  truth.  And  the  performance  of  a  person's  judgments  is 
the  central  characteristic  of  a  second  traditional  theory  of  knowledge,  the 
correspondence  theory  of  truth.  Therefore,  taken  together,  indices  of 
internal and external validity inform us about a person's competence in the
context  of  two  historic  theories  of  truth. 

The  methodology  described  here  makes  it  possible  to  measure  coherence 
and  performance  over  several  concepts  and  methods.  Thus  the  generality  of 
the  behavior  of  each  subject  is  explicated  in  terms  of  each  theory  of 
knowledge  in  the  context  of  a  different  matrix  of  concepts  and  methods, 
each  of  which  provides  its  own  methodological  justification  for  the 
generalization of results.

Among the experts in the example used here, a fairly high relation
(.60)  was  found  between  coherence  and  performance.  The  treatment  of 
coherence  and  performance  as  cognitive  traits  thus  will  allow  us  to  examine 
empirically  theoretical  questions  of  importance  to  both  philosophers  and 
psychologists.  For  example: 

1. Should competence always be a joint product of coherence and
performance? Should these traits always be additive? Or is
coherence a necessary but not a sufficient condition for
performance?  Common  sense  suggests  that  this  should  be  so.  But 
the  relation  between  these  traits  may  depend  upon  the  complexity 
of  the  material  and  the  degree  of  intellectual  training  required 
to  master  it.  That  is,  variation  in  competence  in,  say,  atomic 
physics  may  produce  a  very  high  correlation  between  coherence  and 
performance,  whereas  variation  in  competence  in  financial 
forecasting  may  not.  In  short,  coherence  and  performance  may 
combine  in  different  ways  to  provide  competence,  depending  upon 
the  nature  of  the  material  to  be  dealt  with  and  the  degree  of 
training  of  the  subject. 

2.  How  should  the  measures  of  coherence  and  performance  be  combined 
into  an  overall  measure  of  competence?  Should  they  be  weighted 
according  to  their  relative  importance  and/or  the  quality  of  the 
measures?  Common  practice  is  to  consider  these  measures 
separately.  Moreover,  different  approaches  to  the  study  of 
cognition  give  greater  consideration  to  one  or  the  other  of  these 
aspects  of  competence.  Studies  within  the  framework  of  artificial 
intelligence and problem solving, for example, weight the experts'
coherence (and the coherence of the computer program that
simulates the expert) very highly while placing less weight on
performance. Judgment and decision researchers do the opposite
(see Hammond, 1983). Explicating the concept of competence in
terms of coherence and performance thus suggests that these two
currently independent fields of research are investigating
complementary aspects of competence among experts.

Achieving Generality over Conditions
Hammond, Hamm, and Grassia
Page 29
02 Aug 84

Comparison  of  the  Competence  of  the  Individual  Experts  and  the  Artificial 
Expert 

Tables  3  and  4  (above)  show  the  data  for  the  artificially  constructed 
engineer,  produced  by  taking  the  mean  of  all  the  engineers'  judgments  of 
each  highway,  within  each  of  the  nine  cells,  and  then  performing  a 
multiconcept  multimethod  analysis  on  these  data.  Would  such  an  artificial 
expert, built upon aggregated judgments, provide more competent judgments
than  the  individual  experts? 
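
The construction of the artificial engineer within one concept-method cell can be sketched as follows. This is a minimal illustration with invented ratings: the data, the number of engineers and highways, and the function name are hypothetical, and only the averaging rule is taken from the text.

```python
def artificial_engineer(judgments_by_engineer):
    """Mean judgment per highway across engineers, for one concept-method cell."""
    n_engineers = len(judgments_by_engineer)
    # zip(*...) regroups the ratings so that each tuple holds one highway.
    return [sum(per_highway) / n_engineers
            for per_highway in zip(*judgments_by_engineer)]

# Hypothetical ratings: 3 engineers (rows) by 4 highways (columns).
cell = [
    [2.0, 5.0, 7.0, 4.0],
    [3.0, 6.0, 8.0, 4.0],
    [1.0, 4.0, 9.0, 4.0],
]
print(artificial_engineer(cell))  # [2.0, 5.0, 8.0, 4.0]
```

Repeating this for each of the nine cells yields the aggregated judgments on which the multiconcept multimethod analysis is then performed.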

Table  10  contrasts  the  validity  indices  and  subindices  for  the 
artificial  engineer  with  the  corresponding  indices  for  the  lowest,  mean, 
and  best  of  the  individual  engineers.  For  all  indices  the  artificial 
engineer’s  validity  indices  were  better  than  the  mean  of  the  individual 
engineers'  indices.  Most  important,  for  two  indices  (internal  and  external 
convergent  validity)  the  artificial  engineer's  index  was  better  than  that 
of the best engineer. Finally, combining engineers' individual judgments
produced judgments that were more competent than all but one engineer's
judgments.

Insert Table 10 about here


Summary 

Individual  differences  were  found  in  the  quality  of  experts'  judgment. 
Numerical  measures  were  created  for  a  number  of  procedures  for  measuring 
internal  and  external  convergent  and  discriminant  validity.  These  were 
combined  into  indices  for  the  internal  validity  matrix  (pertaining  to  the 
coherence  of  experts'  judgment)  and  for  the  external  validity  matrix 
(pertaining  to  their  performance).  A  correlation  of  .60  between  coherence 
and  performance  was  found  among  the  engineers  used  in  the  illustrative 
example.  The  coherence  and  performance  of  the  artificial  engineer,  created 
by  averaging  all  individual  engineers'  judgments  of  each  condition  of  the 
study,  proved  superior  to  that  of  the  individual  engineers. 

Discussion 

As  several  noted  psychologists  have  observed,  psychological  research 
lacks  the  cumulative  character  critical  to  the  development  of  a  science. 
In  any  such  circumstance  suspicion  would  arise  that  the  scientific 
discipline  in  question  is  the  captive  of  a  flawed  theoretical  or 
methodological  dogma.  Since  theories  are  numerous  in  psychology,  but 
methodology  is  uniform  throughout  graduate  schools  and  journal  reviews, 
dogmatic  methodology  must  be  the  prime  suspect. 

In  an  attempt  to  address  the  methodological  problem  of  generalization 
we have extended and integrated the pioneering efforts of Campbell and
Fiske (1959) and Brunswik (1956). Using individual experts' judgments of
the safety, capacity and aesthetics of highways made under three
conditions, we first created a multiconcept multimethod matrix of internal
validity for the judgment of concepts about highways, using different
methods of eliciting judgments. This contrasts with Campbell and Fiske's
multitrait multimethod matrix for the measurement of traits of persons,
using different trait-measuring methods. Second, we used criterion
measures for the concepts to create an external validity matrix. Measures
of convergent and discriminant validity can be calculated from each of
these matrices and used to address questions concerning, for example, how
easily concepts can be discriminated or how well each method works. Taking
full cognizance of the empirical relations among criteria in the
determination of external discriminant validity conforms to Brunswik's
demand for the representative design of experiments. Because the
intercorrelations among the concepts are taken into account, the domain of
the generality of the results is explicit.

The logic of the multiconcept multimethod matrix is based on what
Feigl called "triangulation in logical space" (Feigl, 1958; see also
Campbell & Fiske, 1959, p. 84). From a logical point of view, the methods
and concepts selected for study should be completely independent; the
"triangulation" should approximate a right triangle as nearly as possible.
Thus, Campbell and Fiske (1959) discuss "convergence of the independent
methods" and cite Cronbach and Meehl's argument that the use of "diverse
criteria give[s] greater weight to the claim of construct validity than
do . . . predictions of very similar behavior" (Cronbach & Meehl, 1955, p.
295). Brunswik, however, emphasized the fact that the ecological variables
that so often serve as criteria for psychologists' concepts are not
independent, i.e., orthogonal to one another. Therefore, from the
researcher's point of view, Feigl's concept of "triangulation in logical
space" is not to be seen as a goal, but as a condition that serves didactic
purposes, without regard to the demands of specific problems. The proper
goal for the researcher (in contrast to the logician) is "triangulation in
empirical space," in which the logician's worship of orthogonality is
replaced by the researcher's worship of generalization. Informative as the
logician's remarks undoubtedly are, the proper goal of basic research is
generalization of results; and that goal can best be achieved through the
use of "representative triangulation," in experiments as well as in studies
of individual differences.


Addendum 

Curiously,  the  literature  of  modern  physics  does  not  seem  to  include 
many  treatises  on  methodological  issues  relating  to  reliability  and 
validity  of  experiments,  although  there  is  a  long  history  of  treatises  on 
measurement in physics (also apparent in psychology). A recent paper
(Franklin  &  Howson,  1984)  entitled  "Why  do  scientists  prefer  to  vary  their 
experiments?"  treats  this  topic  as  a  contemporary  one,  thus  suggesting 
that  it  does  not  have  a  long  history  (the  oldest  topical  reference  is 
1979).  Also,  there  appears  to  be  no  systematic  treatment  in  physics  of  the 
problem  of  separation  of  method  from  concept  such  as  carried  out  by 
Campbell  and  Fiske  (1959).  Personal  communication  with  Allan  Franklin 
confirms  this  conclusion.  If  psychology  and  physics  are  indeed  beginning 
to  recognize  a  common  methodological  problem  of  considerable  importance, 
much  might  be  gained  from  a  joint  consideration  of  "why  scientists  prefer 
to vary their experiments" (although it is not at all clear that all
scientists do).

A comparison of the manner in which various experimental (physics,
chemistry,  biology)  and  nonexperimental  (astronomy,  archeology)  disciplines 
treat  the  matter  of  repetition  of  experiments,  the  separation  of 
reliability  and  validity,  and/or  the  separation  of  concept  from  method  is 
beyond  the  scope  of  the  present  article.  Nevertheless,  it  is  worth 
mentioning that our impression is that Campbell and Fiske's (1959)
contribution, based on Feigl's (1958) original work, provides a more
sophisticated, detailed examination of this matter than exists elsewhere
(cf. Hacking, 1983).

Footnotes

1. In constructing the internal validity matrix, repeated judgment
reliabilities were not available from the data. Therefore they were
estimated, using R from the linear best fit model of the engineer's
judgments for the film strip and bar graph methods, and using the
correlation between the judgments produced by corrected and uncorrected
formulas  for  the  formula  method.  Further  details  of  these  measures  are 
available  in  Hammond,  Hamm,  Grassia,  and  Pearson  (1984). 

2. To determine whether the z-transformation of a correlation is
significantly different from a zeta of zero (the expected correlation under
the null hypothesis), the z-transformation is converted to a z-score by the
formula (z-transformation minus zeta) divided by the standard error of the
z-transformation (square root of [1/(N - 3)]), and the probability of the
z-score is determined from tables for the normal distribution. See Hays
(1973, p. 662).
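
The significance test in this footnote can be sketched as follows. The sketch assumes that each correlation is computed over the 40 highways described in Appendix A (so N = 40) and uses the normal distribution directly rather than printed tables; the function name and the choice of a two-sided probability are ours.

```python
import math

def fisher_z_test(r, n, rho_null=0.0):
    """z-score and two-sided p-value for a correlation against a null value."""
    z_r = math.atanh(r)                     # z-transformation of the correlation
    zeta = math.atanh(rho_null)             # z-transformation of the null value
    standard_error = math.sqrt(1.0 / (n - 3))
    z_score = (z_r - zeta) / standard_error
    # Two-sided tail probability of the standard normal distribution.
    p_two_sided = math.erfc(abs(z_score) / math.sqrt(2.0))
    return z_score, p_two_sided

# Film strip and bar graph judgments of aesthetics, artificial engineer.
z_score, p = fisher_z_test(0.890, 40)
print(round(z_score, 2), p < 0.001)  # 8.65 True
```

This applies to a single correlation; how the standard error was adjusted for a mean of several z-transformed correlations is not stated in the text.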
Figure Captions

Figure 1. The structure of indices representing coherence, performance and
competence.

Figure A-1. Design of the highway engineers study.

APPENDIX A

Context  of  Application 

Whereas Campbell and Fiske (1959) directed their efforts toward
ascertaining the validity of measures of constructs ("traits") about
people, we attempted in a study of experts to ascertain the validity of
expert judgments of concepts about highways. The purpose of this study was
to examine the relative efficacy of intuitive, quasi-rational and
analytical cognition. Twenty engineers judged the aesthetic value, safety,
and capacity of 40 highways under three modes of cognition. Each
engineer's judgments were studied in each cell of the diagram presented in
Figure A-1. Intuition was induced by requiring each expert to judge each
concept (aesthetics, safety, capacity) from film strips of 1-3 mile
segments of each of the 40 highways. Quasi-rationality was induced by
requiring  each  expert  to  judge  each  concept  from  bar  graphs  that  presented 
the  values  of  nine  attributes  for  each  highway.  Analytical  cognition  was 
induced  by  requiring  each  engineer  to  construct  a  mathematical  formula  for 
each  concept.  An  empirical  criterion  was  available  for  each  concept.  The 
criterion  for  the  aesthetic  value  of  each  highway  was  derived  from  the  mean 
judgment  of  91  citizens  who  judged  the  same  highway  segments  by  rating  the 
film  strips,  or  by  rating  or  ranking  single  frames  from  the  film  strips. 
The  criterion  for  safety  was  the  accident  rate  for  each  highway  segment 
averaged  over  7  years.  The  criterion  for  capacity  was  the  figure 
calculated  by  using  the  procedure  from  the  Highway  Capacity  Manual  1965 
(Highway  Research  Board,  1965).  Each  expert  devoted  roughly  20  hours  to 
the  nine  sessions,  each  of  which  was  separated  by  two-week  intervals.  (See 
Hammond,  Hamm,  Grassia  &  Pe arson,  1984,  for  details.) 
APPENDIX B

Correction  for  Attenuation 

To use the intra-ecological correlations to estimate discriminant
validity accurately in the external validity analysis, two new procedures
are described:

1. Comparison of each heteroconcept correlation with the
corresponding intra-ecological correlation.

2. Comparison of the order of pairwise heteroconcept correlations
with the order of intra-ecological correlations.

These procedures risk being in error if the measures involved in one
correlation are more noisy than the measures involved in another, because
the true correlation of the noisily measured concepts would be
underestimated. We would normally correct for such attenuation, using the
formula:

\[
r_c(a,b) = \frac{r(a,b)}{\sqrt{r(a,a)\, r(b,b)}}
\]

where r(a,b) is the correlation between the measures of concepts a and b,
r_c(a,b) is the correlation corrected for attenuation, and r(a,a) is the
reliability of the measure of a.
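As a minimal sketch of how the correction for attenuation would be
applied if reliabilities were available, the following Python fragment
evaluates the formula directly. The function name and the numerical
values are hypothetical illustrations; they are not data from the
study.

```python
import math


def correct_for_attenuation(r_ab, rel_a, rel_b):
    """Return r(a,b) / sqrt(r(a,a) * r(b,b)).

    r_ab  : observed correlation between the measures of a and b
    rel_a : reliability of the measure of a, r(a,a)
    rel_b : reliability of the measure of b, r(b,b)
    """
    if not (0 < rel_a <= 1 and 0 < rel_b <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_ab / math.sqrt(rel_a * rel_b)


# Hypothetical values, chosen only to illustrate the arithmetic.
observed = 0.30
corrected = correct_for_attenuation(observed, rel_a=0.64, rel_b=0.81)
print(round(corrected, 4))
```

With perfectly reliable measures (both reliabilities equal to 1) the
corrected correlation equals the observed one; the noisier the
measures, the larger the upward correction.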
We have not corrected for attenuation in the illustrative analysis we
present here because the reliabilities were not measured in the study by
Hammond, Hamm, Grassia, and Pearson (1984). Although estimation procedures
for the reliabilities of the engineers' judgments were used in creating the
internal validity matrix (see Footnote 1), we hesitate to use these
estimates in the above formula because the product would be an "estimate of
an estimate". Also, the reliability of the criterion measures cannot be
similarly estimated because, for example, the capacity criterion was
produced from a formula and thus has no measurement error, though it might
still be "in error" in that the formula could be wrong.

Because of these problems with the measurement of reliability, the
comparisons involved in producing external discriminant validity measures 2
through 5 use correlations that have not been corrected for attenuation.
What are the possible effects of this?

1. If the amount of noise is identical for the judgments and the
criterion measures, there is no problem; if (as is more likely)
there is less noise in the criterion measures than in the
engineers' judgments, then in testing Hypothesis 8 we will have
underestimated the extent to which the engineers underdiscriminate
among the concepts. Further, the measures of EDV2 and EDV3 will
be especially noisy.

2. If the concepts are judged or measured with equal amounts of
noise, then we have no problem in comparing them; if on the other
hand one concept is judged or measured with more noise than
another, then the comparison of the patterns in the heteroconcept
triangles in Hypotheses C1 and C2 may be distorted.
To avoid such problems, it is important in planning research using the
multiconcept  multimethod  methodology  to  directly  measure  the  reliability  of 
each  judgment  and  each  criterion  measure,  if  possible. 
APPENDIX C

Further  Measures  of  Internal  and  External  Discriminant  Validity 

This  appendix  explains  and  demonstrates  the  second,  third  and  fourth 
measures  of  internal  discriminant  validity  and  the  fifth  measure  of 
external  discriminant  validity. 

Internal  Discriminant  Validity 

The  second  measure  compares  the  correlations  on  the  reliability 
(monoconcept,  monomethod)  diagonal  (see  Table  3)  with  the  correlations  in 
the  heteroconcept  monomethod  triangle.  The  third  measure  compares  the 
correlations  on  the  validity  (monoconcept  heteromethod)  diagonal  with  the 
correlations  in  the  heteroconcept  monomethod  triangle.  The  results  of 
these  measures  with  respect  to  Hypothesis  4  were  identical  to  those 
determined  by  the  first  internal  discriminant  validity  measure:  the  safety 
and capacity concept pair was the least well discriminated.

The  fourth  internal  discriminant  validity  method,  originally  suggested 
by  Campbell  and  Fiske  (1959)  in  the  passage  quoted  above,  examines  whether 
the  correlations  between  judgments  of  different  pairs  of  concepts  have  the 
same  pattern  regardless  of  the  methods  used  in  making  the  judgments. 

Each  of  the  9  heteroconcept  triangles  contains  correlations  between 
judgments  of  each  of  the  three  possible  pairs  of  concepts:  aesthetics  and 
safety  (ES),  aesthetics  and  capacity  (EC),  and  safety  and  capacity  (SC). 
There  are  six  possible  ways  in  which  these  correlations  may  be  ordered. 
Similarity  of  the  pattern  of  correlations  in  all  nine  heteroconcept 
triangles  is  evidence  that  for  this  set  of  concepts,  this  set  of  methods 
provides  discriminant  validity.  The  distribution  of  the  heteroconcept 
triangles among these orders can be tested with Chi-square against the
expectation that 1.5 triangles (9 triangles divided evenly among the 6
orders) would exhibit each order (cf. Delucchi, 1983).

For example, in the internal validity matrix for the artificial
engineer (Table 3), there are 4 triangles with correlations in the order
SC > ES > EC, 2 triangles with SC > EC > ES, 2 with ES > SC > EC, and 1
with EC > SC > ES. The Chi-square for the artificial engineer is not
significant (Chi-squared = 7.667, df = 5, NS), and there is therefore no
evidence for discriminant validity with this procedure for the artificial
engineer. For all engineers:

HC1: There is no predominant pattern among the heteroconcept
correlations.

The  analysis  was  carried  out  for  each  of  the  20  engineers.  Six 
engineers  deviated  significantly  from  the  expected  distribution;  that  is, 
showed  evidence  for  discriminant  validity.  Four  of  these  had  the  order 
SC  >  ES  >  EC. 
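
The Chi-square value reported above for the artificial engineer can be
reproduced with a short calculation. The sketch below uses the counts
given in the text (4, 2, 2, and 1 triangles, with the two remaining
orders unobserved) and an expected frequency of 1.5 per order. The
function name is ours, and the critical value is the standard
tabulated 5% point for 5 degrees of freedom; with expected frequencies
this small, the Chi-square approximation should be read with caution.

```python
# Observed number of heteroconcept triangles showing each of the six
# possible orders, for the artificial engineer (counts from the text).
observed = {
    "SC > ES > EC": 4,
    "SC > EC > ES": 2,
    "ES > SC > EC": 2,
    "EC > SC > ES": 1,
    "ES > EC > SC": 0,
    "EC > ES > SC": 0,
}


def chi_square_uniform(counts):
    """Pearson Chi-square against equal expected frequencies."""
    expected = sum(counts) / len(counts)  # 9 triangles / 6 orders = 1.5
    return sum((o - expected) ** 2 / expected for o in counts)


CRITICAL_05_DF5 = 11.070  # tabulated upper 5% point, 5 degrees of freedom

stat = chi_square_uniform(list(observed.values()))
df = len(observed) - 1
print(f"Chi-squared = {stat:.3f}, df = {df}, "
      f"significant at .05 = {stat > CRITICAL_05_DF5}")
```

The same function can be applied to each engineer's nine triangles in
turn, which is how the per-engineer results for HC1 would be obtained.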

External  Discriminant  Validity 

The availability of the criterion measures and their intercorrelations
allows us to look more directly at the question that was asked in
Hypothesis C1 concerning the relative sizes of the correlations in the
heteroconcept triangles. We will present this analysis using only data
from the internal validity matrix; a parallel analysis could be done with
data from the external validity matrix.
The correlation between the aesthetics and capacity criterion measures
(.279) is larger than the correlation between aesthetics and safety (.275),
which in turn is larger than the correlation between safety and capacity
(.180). Accurate discriminant validity would require that this
EC > ES > SC pattern occur in each heteroconcept triangle. (Note, however,
that since the EC correlation is almost identical to the ES correlation in
this particular data set, the ES > EC > SC pattern would also be expected
to occur often.) Our hypothesis is:

HC2: Engineers' heteroconcept correlations have the same pattern
as the criterion intercorrelations.


The null hypothesis is the same as for Hypothesis C1. To illustrate
the analysis of this hypothesis: none of the artificial engineer's
heteroconcept triangles exhibited the expected patterns EC > ES > SC or
ES > EC > SC. The Chi-square test was used to determine, for each engineer
individually, whether significantly more of his nine heteroconcept
triangles than expected by chance had the expected pattern EC > ES > SC or
its easily confused competitor ES > EC > SC. This is, of course, a more
stringent test than for HC1. It was found that for only one engineer was
the EC > ES > SC pattern predominant, and even this was not statistically
significant. In fact, the reverse patterns were most common: eight
engineers had SC > ES > EC, and seven had SC > EC > ES.
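
The pattern comparison for HC2 can be sketched as follows. The
criterion intercorrelations (.279, .275, .180) are those reported
above; the three example triangles are hypothetical and serve only to
show how a triangle is classified and counted. The function names are
ours, and ties between correlations are not handled.

```python
def order_label(corr):
    """Order the concept pairs from largest to smallest correlation."""
    ranked = sorted(corr, key=corr.get, reverse=True)
    return " > ".join(ranked)


def count_matches(triangles, accepted):
    """Count the triangles whose order is one of the accepted patterns."""
    return sum(order_label(t) in accepted for t in triangles)


# Criterion intercorrelations reported in the text.
criterion = {"ES": 0.275, "EC": 0.279, "SC": 0.180}
target = order_label(criterion)

# The expected pattern and its easily confused competitor.
accepted = {target, "ES > EC > SC"}

# Hypothetical heteroconcept triangles for one engineer.
triangles = [
    {"ES": 0.60, "EC": 0.55, "SC": 0.80},  # SC > ES > EC
    {"ES": 0.40, "EC": 0.45, "SC": 0.30},  # EC > ES > SC
    {"ES": 0.50, "EC": 0.35, "SC": 0.20},  # ES > EC > SC
]

print(target)
print(count_matches(triangles, accepted))
```

In the actual analysis each engineer contributes nine triangles, and
the resulting count is what the Chi-square test evaluates.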

The fourth measure of internal discriminant validity, applied in
testing Hypothesis C1, and the fifth measure of external discriminant
validity, applied to Hypothesis C2, did not reveal any evidence for
discriminant validity in this study. This contrasts with the findings
using the other discriminant validity measures. Although an explanation is
available (the engineers judge safety and capacity to be more similar to
each other than either is to aesthetics, when in fact aesthetics is more
closely related to each than they are to each other), it is clear
that putting requirements on the pattern of heteroconcept correlations
represents a stricter test of discriminant validity than the other
procedures that Campbell and Fiske (1959) suggested for measuring it.
APPENDIX D

Procedure for Producing Indices of Validity

The various indices (e.g., of internal discriminant validity, external
validity, or overall validity) are produced by taking the mean of the
appropriate subindices (e.g., the first measure of internal discriminant
validity, or external convergent validity) according to the pattern
illustrated in Figure A-1. Each subindex is produced for each engineer by
taking the mean of z-transformed correlations, from specific locations in
the internal or external validity matrices, or the mean of the differences
between such z-transformed correlations, corresponding to the comparisons
that were illustrated above with Hypotheses 1-8. Table 9 displays the
formulas for each of the 9 subindices, at each of 6 possible levels of
aggregation. For example, the formula for the internal convergent validity
index, at the concept level of aggregation, is:

\[
M_{j,k;\; j \ne k} \; r_{m_j m_k}
\]

This index is calculated for each concept m. It is the mean, over all
pairs of methods j and k where j is different from k, of the
z-transformations of r_{m_j m_k}, which is the correlation between two
judgments of concept m, using method j and method k. The correlations for
the external validity matrix are (with one exception) of form r_{m n_j};
that is, the correlation between the criterion measure of concept m and the
engineer's judgment of concept n using method j. M is used as a "mean"
symbol, representing a sum of correlations divided by the number of
correlations summed over. The correlations involved in producing all the
subindices in this table have been z-transformed.
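
A minimal sketch of the internal convergent validity index for one
concept is given below. It applies Fisher's z-transformation (the
inverse hyperbolic tangent) to each heteromethod correlation and
averages the results. The correlations and the function names are
hypothetical, and we assume here that the index is left in z units;
whether it is transformed back to the correlation scale is not stated
above.

```python
import math


def fisher_z(r):
    """Fisher's z-transformation of a correlation coefficient."""
    return math.atanh(r)


def convergent_validity_index(corr_by_method_pair):
    """Mean of z-transformed correlations over all method pairs j != k."""
    zs = [fisher_z(r) for r in corr_by_method_pair.values()]
    return sum(zs) / len(zs)


# Hypothetical correlations between judgments of one concept m made
# with different methods (not data from the study).
r_m = {
    ("intuition", "quasi-rationality"): 0.50,
    ("intuition", "analysis"): 0.40,
    ("quasi-rationality", "analysis"): 0.70,
}

icv_m = convergent_validity_index(r_m)
print(round(icv_m, 3))
```

Because the correlation between methods j and k equals that between k
and j, averaging over unordered pairs gives the same value as averaging
over all ordered pairs with j different from k.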
Once the subindices are calculated as in Table 9, they are combined as
indicated in Figure A-1. Thus, the mean of the three internal discriminant
validity subindices (IDV1, IDV2, and IDV3) is the index for internal
discriminant validity (IDV); the mean of IDV and the internal convergent
validity index (ICV) is the index for coherence or internal validity (IV);
and the mean of IV and the index for performance or external validity (EV)
is the index for overall competence (V).
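
The successive averaging just described can be written out as a short
sketch. The input values are hypothetical, the function names are ours,
and EV is taken as given rather than built up from its own subindices.

```python
def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def combine_indices(idv1, idv2, idv3, icv, ev):
    """Combine subindices into IDV, IV, and V by successive means."""
    idv = mean([idv1, idv2, idv3])  # internal discriminant validity
    iv = mean([idv, icv])           # coherence (internal validity)
    v = mean([iv, ev])              # overall competence
    return {"IDV": idv, "IV": iv, "V": v}


# Hypothetical subindex values for one engineer (not data from the study).
result = combine_indices(idv1=0.6, idv2=0.5, idv3=0.7, icv=0.8, ev=0.3)
print({name: round(value, 3) for name, value in result.items()})
```

Note that this scheme weights each branch equally at every level, so
that IDV1, IDV2, and IDV3 each carry less weight in V than ICV or EV
does.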

In order that these indices be on a common scale, in which the
meanings of the numbers are preserved when they are involved in the
arithmetic operations of calculating means and differences, the indices
consist only of those measures of reliability, convergent validity, and
discriminant validity that are correlations or differences between
correlations (after Fisher's z-transformation of the correlations).
Therefore the procedures used for testing Hypotheses C1 and C2 (in Appendix
C), which are not expressible as correlations, are not included in this
index. Further, the second and third external discriminant validity
measures used here are the absolute values of the differences between the
engineer's heteroconcept correlation and the corresponding criterion
intercorrelation (which addresses accuracy), while relative differences
were used to test Hypothesis 7 (which addressed the question of over- or
underdiscrimination). Finally, note that at some levels of aggregation
specific subindices cannot be created. For example, it is not possible to
measure convergent validity at the level of concept pairs, because
convergent validity deals, by definition, with only one concept.
Similarly, it is not meaningful to create an index for the external
validity of a pair of judgment methods, for the external validity matrix
deals with only one judgment at a time. (A measure was possible for EDV3,
however, because it is derived from the internal validity matrix.) This
means that the index should not be used for making comparisons between
different levels of aggregation.

These indices are useful for a number of purposes. They can, for
example, provide measures for evaluating:

1. Individual engineers' ability to discriminate among concepts (use
individual IDV or EDV indices at the Overall level of aggregation
in Table 9). In the present study, the engineers' individual
internal discriminant validity indices range from .432 to .894,
and their external discriminant validity indices range from -.083
to .101.

2. How well individual concepts can be judged (use mean V, IV, or EV
indices at the Concept level of aggregation in Table 9). In the
present study, aesthetics is judged best (internal validity = .93,
external validity = .66), safety next (internal validity = .49,
external validity = .23), and capacity third (internal
validity = .45, external validity = .24).

3. How well pairs of concepts can be discriminated (use IDV or EDV
indices at the Concept Pair level of aggregation). In the present
study, the aesthetics and capacity concepts are just as easily
discriminated (IDV = .83, EDV = .07) as the aesthetics and safety
concepts (IDV = .82, EDV = .08); safety and capacity are most
readily confused (IDV = .32, EDV = -.16).

4. How well specific methods work (use indices at the Method level of
aggregation). Both internal and external validity show that
analysis is the best method for judging these concepts (internal
validity = .80, external validity = .46), quasi-rationality next
(internal validity = .61, external validity = .45), and intuition
third (internal validity = .44, external validity = .23).

5. How well pairs of methods work (use indices at the Method Pair
level of aggregation). Consistent with the previous result, if
one wished to use only two of the three methods on a future
project, one would choose the quasi-rational and analytical
methods (IV = .63, EV = -.21) rather than the intuitive and
quasi-rational (IV = .35, EV = -.23) or the intuitive and
analytical (IV = .36, EV = -.24) methods.